FOREWORD

The  purpose  of  this  study  is  to  provide  a  series  of  computer  subroutines 
which  will  enable  the  user  to  take  a  set  of  patterns  and  expand  them  into  a 
larger  distorted  set.  The  input  to  these  subroutines  will  consist  of  a  binary 
representation of a perfect character (or characters). The output will be a
binary  representation  of  the  same  character(s)  with  a  predetermined  amount 
and  type  of  distortion  and/or  displacement. 

A  set  of  programs  of  this  type  may  be  useful  for  the  testing  of  character 
recognition  systems.  It  is  the  object  of  these  recognition  systems  to 
adequately  identify  an  input  character  which  may  be  distorted  in  only  one 
or  in  several  ways.  To  facilitate  the  testing  of  such  a  recognition  scheme 
it  is  desirable  to  have  a  large  variety  of  samples  of  characters.  These 
characters  may  be  of  many  different  fonts  with  varying  degrees  of  serifs. 
For  each  of  these  categories,  samples  should  be  available  with  varying 
amounts  and  types  of  distortion  or  deterioration.  An  adequate  analysis 
along these lines would necessitate the use of hundreds of samples. For a
full  alphabet  consisting  of  letters  (both  lower  case  and  capitals),  numbers, 
punctuation,  and  special  symbols,  it  would  be  virtually  impossible  to  make 
all  of  these  up  individually. 

The  subroutines  discussed  in  this  report  provide  a  means  whereby  the 
desired  samples  might  be  obtained  from  one  original  of  each  type.  While 
the  characters  generated  by  these  routines  (used  separately  or  in  various 
combinations)  may  not  correspond  to  a  particular  type  of  distortion  or 
deterioration,  it  is  believed  that  an  adequate  simulation  of  actual  conditions 
will  be  realized. 


TDR-63-44


ABSTRACT


This  study  develops  a  set  of  computer  subroutines  which  attempt  to 
simulate  various  degrees  of  distortion  and  deterioration  on  an  input  character. 
A  complete  set  of  the  letters  of  the  Cyrillic  alphabet,  numerals,  and  some 
punctuation  were  quantized  and  punched  to  be  used  as  ideal  input  characters 
to  transforming  subroutines. 

These  transforming  subroutines  permit  the  generation  of  large  sample 
sets  of  characters  containing  controlled  amounts  of  various  types  of  distortion. 
Use  of  the  routines  will  produce  a  set  of  test  characters  in  which  the  pattern 
size,  percentage  distortion  (or  amount  of  shift),  and  number  of  output  samples 
desired  are  parameters.  The  output  of  any  of  the  programs  developed  can  be 
used  as  the  input  to  a  character  recognition  routine.  It  is  the  object  of  this 
study  to  produce  a  variety  of  source  pattern  representations  of  the  same 
pattern.  The  user  can  thus  develop  his  own  series  of  tests  to  determine  the 
identification  criteria  invariance  of  a  recognition  scheme.  The  use  of  these 
programs is demonstrated in conjunction with a recognition routine.

1.  INTRODUCTION

Character generation, as defined for this study, is the production of
distorted  patterns,  both  random  and  controlled,  with  a  predetermined  amount 
(percentage)  of  deterioration  or  change.  This  distortion  can  manifest  itself 
in  many  ways:  variations  in  the  "weight"  (the  amount  of  ink  the  type  puts  on 
the  paper),  closing  of  holes,  openings  in  solid  lines,  and  spaces  where  lines 
intersect  are  but  a  few.  Vertical  or  horizontal  displacement  of  the  character 
as  well  as  slight  rotation  would  be  classified  as  distortion.  This  study  is 
concerned  with  four  simulations  of  distortion.  Random  unbiased  distortion 
consists  of  randomly  selecting  quantized  cells  of  a  character  and  changing 
them.  This  includes  the  background  (to  supply  "noise")  as  well  as  the  image 
itself.  Random  biased  distortion  randomly  deletes  a  row  or  column  of  the 
quantized  characters  until  the  desired  distortion  is  established.  Numerical 
definitions  with  a  more  elaborate  discussion  of  these  categories  will  be  given 
in succeeding sections of this report. The other two simulations are concerned
with linear displacement and rotation.

Since  the  purpose  of  this  study  is  to  provide  a  means  of  testing  out 
recognition  devices,  it  is  desirable  to  have  at  least  a  broad  understanding 
of  character  (or  pattern)  recognition. 

Pattern recognition has been defined as the assignment of a meaningful
code to a recognizable structure. In practice, the structure is represented
by a set of signals. The signals are the result of a transformation (i.e.,
quantization) of the original pattern. There are many varied solutions to the
problem  of  pattern  recognition,  of  which  several  may  be  acceptable.  The 
best,  of  course,  is  dependent  upon  the  intended  application.  In  most  methods, 
some  form  of  quantizing  takes  place.  Commonly,  quantizing  is  a  process 
whereby  the  original  pattern  is  superimposed  on  a  grid  and  thereby  divided 
up  into  an  array  of  cells.  A  binary  representation  of  the  pattern  is  now  made 
possible  by  assigning  a  "one"  to  each  of  the  cells  in  which  a  portion  of  the 
figure,  above  some  threshold  value,  occurs,  and  a  "zero"  to  each  blank  cell 
(i.e., no part of the figure or an amount less than the threshold occurs). In
some cases the threshold is kept as a parameter. There are roughly three
categories of recognition systems. A brief description of these categories
is as follows.

The  first  is  the  element  matching  system  in  which  selected  cells  (peephole 
matching)  or  the  entire  array  of  cells  of  the  unknown  character  are  matched 
with  corresponding  cells  of  known  characters.  The  matching  can  be  done 
logically  or  statistically,  one  at  a  time  or  in  parallel. 

Secondly, there is wave-shape comparison. Using optical or magnetic
transducers,  a  wave  shape  characteristic  of  the  scanned  unknown  character 
is  obtained  and  compared  with  a  set  of  wave  shapes  of  known  characters.  The 
character  whose  wave  shape  provides  the  greatest  correlation  is  chosen. 
Threshold values are used to prevent inaccurate identification by causing rejection
of characters which provide excessively low maximum correlation
coefficients. 

The  third  method  incorporates  a  form  of  characteristic  extraction.  This 
method  is  based  on  the  assumption  that  certain  features  may  be  found  which 
are  invariant  under  slight  distortion.  In  many  characteristic  extraction 
methods, a decision tree is used.

It  is  the  object  of  the  recognition  system  to  utilize  the  output  of  the 
scanner  (or  any  transformation  unit  between  source  and  input)  and  through  a 
series  of  transformations  or  operations  on  these  data,  identify  the  original 
pattern.  The  output  of  this  system  would  be  a  code  emblematical  of  the 
unknown  character. 
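
The grid quantization with a threshold described in this introduction can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be a list of rows of ink densities between 0 and 1, the image is at least as large as the grid, and the function name quantize and its parameters are hypothetical and do not come from the report.

```python
def quantize(image, rows, cols, threshold=0.5):
    """Superimpose a rows x cols grid on an image of ink densities (0..1).

    A cell becomes 1 when its mean ink coverage reaches the threshold and
    0 otherwise. Assumes len(image) >= rows and len(image[0]) >= cols so
    that every grid cell covers at least one image point.
    """
    height, width = len(image), len(image[0])
    grid = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        grid_row = []
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            cell = [image[y][x]
                    for y in range(top, bottom)
                    for x in range(left, right)]
            coverage = sum(cell) / len(cell)
            grid_row.append(1 if coverage >= threshold else 0)
        grid.append(grid_row)
    return grid
```

Keeping the threshold as an argument mirrors the remark that in some cases the threshold is kept as a parameter.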

2.  THEORETICAL  BASIS 

Ideally, for testing character-recognition devices, a statistically significant
set of samples representing all of the conceivable inputs should be
available. These samples should represent errors produced in the original
typing or printing, errors in the centering of the image, errors produced by
the scanner, etc. Several routines have been developed, the use of which
(separately or in various combinations) attempts to adequately simulate many
of the conditions described above. To facilitate the simulation of these conditions
as well as to allow for greater flexibility, the percentage of distortion
and the resolution are treated as parameters. The use of binary or BCD
inputs and outputs is likewise left to the user's choice.

An  ideal  set  of  characters  was  quantized  with  scanning  assumed  to  be 
vertical starting at the upper left of the character. The digital representation
resulting from the quantizing was then placed on IBM cards, with each card
taking one scan. Initial positions on the card were used to designate the
character.  For  instance,  columns  were  used  to  identify  whether  the  character 
was  alphabetic,  numerical,  symbolic,  or  punctuation,  what  the  origin  set  was, 
and  which  image  of  the  set  is  partially  contained  on  the  card.  A  final  column 
was  used  to  denote  the  sample  number.  One  sample  consists  of  a  complete 
set  of  characters  from  the  alpha-numeric  set  in  question.  The  number  of 
samples  tells  the  routines  how  many  total  characters  will  be  read  in,  thus 
enabling  the  computer  to  test  for  the  last  sample.  Columns  were  also  used 
to  denote  the  scan  number.  The  rest  of  the  card  is  available  for  the  scan. 

Percentage  random  unbiased  distortion  is  defined  as  the  ratio  of  the 
number  of  cells  changed  to  the  number  of  cells  scanned.  Formally,  for  a 
30 x 32 resolution where 15 percent random unbiased distortion is desired:

$$0.15 = \frac{X}{(30)(32)}$$

X equals 144, which means that 144 cells will be selected at random and their
contents  changed.  Fractions  are  truncated.  It  is  obvious,  therefore,  that  to 
achieve  the  distortion  of  individual  cells  or  bits  of  the  quantized  character, 
we  must  be  able  to  address  them  individually.  The  data  cards  representing 
the  resolutions  of  the  individual  characters  are  read  into  the  computer 
according  to  a  format  which  causes  the  resolution  to  be  treated  as  a  double 
subscripted  variable.  This  allows  each  cell  to  be  stored  in  the  computer  as 
a  separate  word.  To  change  a  bit,  we  merely  have  to  "call  in"  the  word  and 
test  its  contents,  and  set  it  equal  to  the  opposite  condition. 

The  arbitrary  parameters  are  read  in  to  determine  resolution  dimensions, 
percentage  distortion,  number  of  samples,  output  tapes,  and  mode  of  output 
(binary  or  BCD).  If  binary  output  is  used,  it  may  be  desirable  to  get  a  listing 
of one sample to be assured that the results are as intended. The parameters
are read in, the characters stored, and the number of cells to be changed is
computed. Loops are set up to provide the desired quantity of output and the
distortion is performed. Two random numbers are "called in" (magnitude
between 0 and 1) and each is multiplied by a resolution dimension. The
two resulting numbers can then be used as subscripts to call in the cell to be
changed. Since the characters are made up of 0's and 9's (representing
absence and presence of the original image, respectively), a test is made to
determine whether the cell contains a 0 or a 9 and the cell is changed
accordingly. The choice of 0's and 9's was arbitrary. The output is then
recorded  and  the  program  continues  to  completion. 
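
The random unbiased distortion procedure described above might be sketched in Python as follows. This is an illustration, not the original IBM 7090 program: the names cells_to_change and unbiased_distortion are hypothetical, the distortion is passed as a whole-number percentage so that the truncation is done in integer arithmetic, and subscripts are zero-based. It is also assumed that the same cell may be drawn more than once, since the text does not say otherwise; a cell drawn twice returns to its original value.

```python
import random

INK, BLANK = 9, 0  # the report's arbitrary codes for image and background


def cells_to_change(percent, rows, cols):
    """Number of cells to alter; fractions are truncated."""
    return percent * rows * cols // 100


def unbiased_distortion(character, percent, rng=random):
    """Return a copy of the character with randomly chosen cells inverted."""
    rows, cols = len(character), len(character[0])
    distorted = [row[:] for row in character]
    for _ in range(cells_to_change(percent, rows, cols)):
        # Two random numbers in [0, 1), each scaled by a resolution dimension,
        # serve as the subscripts of the cell to be changed.
        i = int(rng.random() * rows)
        j = int(rng.random() * cols)
        distorted[i][j] = BLANK if distorted[i][j] == INK else INK
    return distorted
```

For a 30 x 32 character, cells_to_change(15, 30, 32) gives the 144 selections of the worked example above.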

This  routine  is  an  attempt  to  simulate  the  random  distortion  which  might 
be  caused  by  an  inefficient  scanner,  filter,  or  smoother.  (A  filtering  unit 
and  a  smoothing  unit  are  two  devices  which,  in  practical  applications,  might 
precede  the  recognition  device  and  filter  out  extraneous  noise  and  smooth 
over  straight  lines  and  curved  sections.) 

The  displacement  routine  performs  random  shifting.  The  input  parameters 
are  resolution  size,  number  of  samples,  input  and  output  tape  unit  numbers, 
binary or BCD mode control, and maximum linear shift. This routine performs
four shifts, one in each direction, each of a random number of cells (less
than an arbitrary maximum). The resulting image has been shifted, at least
partially,  into  one  of  the  four  quadrants.  The  empty  cells  have  been  left 
blank.  This  routine  is  an  attempt  to  simulate  conditions  which  may  be  caused 
if  the  printer  or  scanner  is  off  center. 
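
A minimal sketch of such a displacement routine is given below. Several details are assumptions rather than facts from the report: the four shifts are applied one after another (up, down, left, right), each amount is drawn uniformly from 0 through max_shift inclusive, image cells pushed past the edge are lost, and the names shift and random_displacement are hypothetical.

```python
import random


def shift(character, d_row, d_col, blank=0):
    """Move every cell by (d_row, d_col); vacated cells are left blank."""
    rows, cols = len(character), len(character[0])
    out = [[blank] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            ni, nj = i + d_row, j + d_col
            if 0 <= ni < rows and 0 <= nj < cols:
                out[ni][nj] = character[i][j]
    return out


def random_displacement(character, max_shift, rng=random):
    """Apply four random shifts in turn: up, down, left, and right."""
    out = character
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        amount = rng.randint(0, max_shift)  # inclusive bounds (assumption)
        out = shift(out, d_row * amount, d_col * amount)
    return out
```

Because the upward and downward shifts partly cancel, as do the left and right shifts, the net effect is a displacement of the image toward one of the four quadrants.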

One  of  the  most  important  requirements  of  these  distortion  subroutines 
is  that  they  adequately  simulate  the  conditions  which  may  be  imposed  by 
inefficient or inadequate devices which would precede the character recognition
system. This must be accomplished with no knowledge as to what the
original  image  was,  where  it  was,  or,  in  some  cases,  what  form  of  type  was 
used  in  its  printing.  The  most  probable  source  of  distortion  is  the  worn 
typewriter  ribbon.  The  random  biased  distortion  routine  attempts  to  simulate 
this  condition. 

Referring  to  figure  1,  we  can  see  that  if  a  section  of  a  typewriter  ribbon 
has been used consistently to type such characters as P, H, t, A, B, R, E,
F, and 1, the central part of the ribbon could conceivably become worn and
produce a G which resembled a C as is shown. Such an example may be
extreme; however, it serves to demonstrate the objectives of this routine.

The  routine  randomly  selects  and  deletes  entire  rows  and  columns  of  the 
quantized character. To accomplish this, a random number is used to select
between row deletion and column deletion. A second random number chooses
the row (or column) to be deleted. If we define percentage random biased
distortion in a manner similar to that used for unbiased distortion, then for a
30 x 32 quantization, with 25 percent random biased distortion:


$$0.25 = \frac{X}{30 + 32}$$

$$X = 15.5$$


With a total of 62 rows and columns, (0.25)(62) = 15.5, so 15 deletions must be
made for 25 percent random biased distortion (fractions are again truncated).
The deletions are accomplished in the IBM 7090 by setting each element in
the selected row (or column) of the character to zero (which represents
absence of image).
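
The row and column deletion just described might look like the following in Python. This is a sketch under assumptions, not the report's program: rows and columns are taken to be equally likely, the same row or column may be selected more than once, the distortion is passed as a whole-number percentage, and the names deletions and biased_distortion are hypothetical.

```python
import random


def deletions(percent, rows, cols):
    """Number of rows and/or columns to delete; fractions are truncated."""
    return percent * (rows + cols) // 100


def biased_distortion(character, percent, rng=random, blank=0):
    """Return a copy with randomly chosen whole rows or columns blanked."""
    rows, cols = len(character), len(character[0])
    out = [row[:] for row in character]
    for _ in range(deletions(percent, rows, cols)):
        # First random number: choose between row and column deletion.
        if rng.random() < 0.5:
            i = int(rng.random() * rows)   # second number picks the row
            out[i] = [blank] * cols
        else:
            j = int(rng.random() * cols)   # second number picks the column
            for row in out:
                row[j] = blank
    return out
```

With a 30 x 32 quantization, deletions(25, 30, 32) gives 15, in agreement with the worked example.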

The  definitions  of  the  two  types  of  random  distortion  are  independent  of  the 
type  style  as  well  as  of  initial  quality.  For  a  given  resolution  and  a  fixed 
amount  of  distortion  or  deterioration,  lower  case  letters  and  capital  letters 
are  affected  in  accordance  with  the  respective  space  covered  by  each.  To 
demonstrate this advantage, let us assume that we have a bit counter following
the scanner and define percentage random unbiased distortion as the ratio of
cells changed to the total number of 1's produced (assuming that the quantization
process consisted of 1's and 0's). This causes distortion to become a
function of the size of the image rather than of the size of the area scanned
and the resolution. If we have high resolution (a large number of cells per
unit character area) and a small original image, we could, by this new
definition, achieve 100 percent deterioration and not change a single cell of
the original image. It is also noteworthy that we could not distort a space
with this definition, a procedure which is desirable for testing threshold
values. It is necessary that distortion be defined as a function of resolution
size with no dependence upon 1's count or size of the scanned image.

An  attempt  was  made  in  this  study  to  provide  a  program  which  would 
rotate  the  character  a  random  number  of  degrees.  The  brute  force  method 
of  accomplishing  this  would  be  to  sketch  out  on  paper  an  actual  rotation  for 
each degree (clockwise as well as counterclockwise) and have the result programmed
for each case. This would require an instruction per cell, per degree.
Such a procedure would mean a program of approximately 50,000 instructions
for a 50 x 50 quantization (50 x 50 = 2500 instructions per degree, 25,000 for
a maximum of only 10 degrees in one direction, and 50,000 for a maximum of
10 degrees in both directions). This method was discarded because of the
program size it would require.

It must be realized that, to be practical, a rotation routine should be compatible
with the other routines developed in this study. We therefore must
start and end with the rectangular type of representation of the characters.

Each  character,  before  being  shifted,  possesses  a  certain  information  content. 
Unless  the  character  is  shifted  much  beyond  the  confines  of  the  quantization, 
the  information  content  will  remain  approximately  the  same.  In  circular 
shifting  (or  rotation)  cells  are  moved  in  directions  not  normal  to  one  of  their 
sides.  Some  cells  will  therefore  not  fall  completely  into  a  new  cell  (as  in 
linear  shifting)  and  the  information  content  of  the  character  will  consequently 
be  diminished.  To  preserve  the  information  content  of  the  characters,  a 
circular  representation  was  established  which  provides  for  each  cell. 

It  is  noted  that  in  the  previously  discussed  shift,  the  character  was  treated 
as  having  rectangular  coordinates,  and  only  one  coordinate  was  changed  in 
performing  the  shift.  An  attempt  was  made,  for  rotation,  to  transform  the 
rectangular  representation  to  a  circular  representation  and  similarly  change 
only  one  coordinate  (the  "angle"),  and  then  transform  back  to  the  rectangular 
representation.  A  50  x  50  quantization  was  selected  and  rotation  was  performed 
about  the  midpoint.  The  original  quantized  character  (being,  in  most  cases, 
smaller  than  50  x  50)  can  be  placed  in  any  part  of  the  50  x  50  mesh  and  thereby 
rotated  about  points  other  than  the  center.  The  changing  of  coordinate  systems 
was  accomplished  by  setting  up  "vectors"  from  the  center.  The  quantization 
was  viewed  as  being  a  sequence  of  concentric  squares.  The  first  "circle  of 
squares" contained 4 squares numbered consecutively starting at the upper
left corner. This procedure was continued through vector 25, which contained
196 squares. When the rotation angle was selected, the squares of each
vector were rotated (for example, clockwise) according to the formula:

$$\frac{x}{360}\,Y = \text{number of squares shifted in the particular vector}$$

where  Y  is  the  number  of  squares  in  the  vector  in  question  and  x  is  the 
number  of  degrees  of  rotation.  The  parameters  available  in  the  various 
sections  of  the  routine  provide  for  (1)  the  optional  print-out  of  the  character 
location  in  the  50  x  50  matrix  (in  case  center  rotation  is  not  desired),  (2)  card 
or  tape  input,  (3)  binary  or  BCD  input,  (4)  a  controlled  or  random  amount  of 
rotation,  (5)  controlled  or  random  direction  of  rotation,  (6)  choice  of  maximum 
rotation,  (7)  optional  print-out  of  results,  and  (8)  binary  or  BCD  output. 
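
The concentric-square scheme can be illustrated with the short sketch below. It assumes a square grid with an even side, whole-number angles, zero-based subscripts, and truncation of fractional shifts; the report does not state how fractions were handled in this routine, and the names ring_cells and ring_rotate are hypothetical.

```python
def ring_cells(size, k):
    """Cells (row, col) of ring k (1 = innermost 2 x 2) of a size x size
    grid, numbered clockwise starting at the ring's upper left corner."""
    lo, hi = size // 2 - k, size // 2 + k - 1
    cells = [(lo, c) for c in range(lo, hi + 1)]            # top, rightward
    cells += [(r, hi) for r in range(lo + 1, hi + 1)]       # right, downward
    cells += [(hi, c) for c in range(hi - 1, lo - 1, -1)]   # bottom, leftward
    cells += [(r, lo) for r in range(hi - 1, lo, -1)]       # left, upward
    return cells


def ring_rotate(grid, degrees):
    """Shift each ring by (x / 360) * Y squares; positive is clockwise."""
    size = len(grid)
    out = [row[:] for row in grid]
    for k in range(1, size // 2 + 1):
        cells = ring_cells(size, k)
        y = len(cells)                      # Y: squares in this vector
        s = abs(degrees) * y // 360         # truncated shift (assumption)
        if degrees < 0:
            s = -s
        for p, (r, c) in enumerate(cells):
            nr, nc = cells[(p + s) % y]
            out[nr][nc] = grid[r][c]
    return out
```

For a 50 x 50 grid, ring_cells(50, 1) holds 4 squares and ring_cells(50, 25) holds 196, as stated above. At 90 degrees every ring shifts by exactly one quarter of its length, so the result coincides with a true quarter turn.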

Unfortunately,  this  program  provides  more  than  was  originally  intended. 
While working perfectly for angles of 0, 90, 180, 270, and 360 degrees (in
either direction), it produced distortion at angles between these. It is
possible  that  this  program,  in  its  present  form,  may  be  useful  for  generating 
samples  to  test  cursive  handwriting.  While  this  would  be  for  handwritten 
character  recognition,  an  undistorted  rotation  is  still  desirable  for  printed 
images. Work is continuing in an effort to develop an acceptable code for
"pure" rotation. One possible approach which is being considered is that of
viewing  the  quantized  character  as  a  mapping,  and  developing  a  transformation 
"multiplier"  which  would  rotate  the  character  as  desired. 

3.  EXPERIMENTAL  RESULTS 

Before  demonstrating  the  application  of  the  distorted  characters,  it  is 
first necessary to briefly discuss the particular character-recognition program
used.  The  basic  program  was  written  by  Breuer  and  recognition  is  based 
upon  probabilistic  comparisons.  A  set  of  "known"  characters  is  used  to 
establish  frequency  distributions  of  element  values.  Each  cell  is  assumed  to 
be  independent  of  other  cells;  however,  the  probability  of  a  given  character's 
occurring  becomes  either  less  or  greater  with  the  knowledge  of  the  contents 
of  each  new  cell  examined.  A  probability  matrix  is  thereby  set  up  for  each 
character in the set.

As each unknown is introduced to the program, it is possible, on a cell-by-cell
basis, to compute its "score-of-match" with respect to each character
of  the  set.  A  "best-fit"  is  used  for  decision  making  in  this  particular  program; 
however,  a  threshold  could  be  incorporated  if  desired. 
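
A cell-by-cell probabilistic comparison of this general kind might be sketched as follows. This is not a reconstruction of Breuer's program: it assumes cells coded 1 and 0, treats cells as independent, sums log-probabilities as the score-of-match, and uses add-one smoothing so that no probability is exactly 0 or 1. The names train, score, and recognize are hypothetical.

```python
import math


def train(samples_by_label):
    """Estimate P(cell = 1) for each cell from the "known" characters.

    samples_by_label maps a label to a list of equally sized 0/1 grids.
    """
    model = {}
    for label, grids in samples_by_label.items():
        n = len(grids)
        rows, cols = len(grids[0]), len(grids[0][0])
        model[label] = [
            [(sum(g[i][j] for g in grids) + 1) / (n + 2) for j in range(cols)]
            for i in range(rows)
        ]
    return model


def score(grid, probs):
    """Log-probability of the unknown grid under one character's matrix."""
    total = 0.0
    for row, prob_row in zip(grid, probs):
        for cell, p in zip(row, prob_row):
            total += math.log(p if cell else 1.0 - p)
    return total


def recognize(grid, model):
    """Best-fit decision: the label with the highest score-of-match."""
    return max(model, key=lambda label: score(grid, model[label]))
```

A rejection threshold could be added by comparing the best score, or its margin over the runner-up, with a chosen limit before accepting the decision.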

To  demonstrate  the  application  of  the  distorted  characters,  this  report 
will  discuss  only  the  recognition  test  in  which  the  random  unbiased  distortion 
characters  were  used.  Since  these  programs  are  completely  compatible  with 
each other, the usage is identical for all routines.* A set of ten similar
characters  was  used.  A  sample  "perfect"  character  is  shown  in  figure  2.  A 
discussion  of  some  of  the  tests  run  and  their  results  follows. 

a.  Random  unbiased  distortion  —  3  percent 

The  3-percent  figure  was  chosen  as  a  low  value  which  would  provide 
sufficient  distortion  to  test  a  recognition  routine  yet  not  enough  to  exceed  the 
recognition efficiency of the routine. A sample with 3 percent distortion is
shown in figure 3. The figure is outlined, as are the individual cells whose
contents have been changed. In a 30 x 32 quantization, 3 percent random unbiased
distortion will alter 28 cells.

b.  Recognition  -  unbiased  distortion 

An ideal character-recognition scheme would be capable of the
correct  identification  of  a  pattern  as  long  as  it  is  recognizable  by  the  human 
eye.  This,  of  course,  is  a  grossly  unrealistic  goal,  because  of  the  innate 
ability  of  the  human  being  to  include  prior  context  and  concurrent  stimuli 
along  with  visual  acuity.  Also,  when  a  pattern  is  distorted  beyond  human 
recognition, we would like the ideal scheme to "reject" the pattern as
non-recognizable. The ability of the character recognition scheme to correctly
identify a pattern should therefore parallel, as closely as possible, that of a
human being.

*A listing of any of the programs is available from the author.

It can be seen from figure 3 that patterns with 3 percent random
unbiased distortion are still easily recognizable. We therefore should expect
that a character recognition routine would have 100 percent efficiency when
attempting to recognize these characters. When these characters were examined
by the recognition routine, correct recognition was achieved in each
case  by  a  considerable  margin.  At  this  low  level  of  distortion  the  recognition 
routine  produced  the  desired  results. 

Recognition of patterns with up to 20 percent random unbiased distortion
was also 100 percent effective. The theoretical goal of a recognition
program, as pointed out previously, is to accurately identify patterns with
100 percent efficiency until distortion is increased to the point where such
efficiency by a human being is impossible. It is interesting to note that while
the character shown in figure 4, with 20 percent random unbiased distortion,
is recognizable to a human being only after a few seconds of concentration
(before outlining), the routine was able to effect correct identification by a
considerable margin.

While  20  percent  is  probably  above  the  maximum  amount  of  distortion 
of  this  type  that  would  be  encountered  in  a  practical  situation,  this  author  was 
curious  to  see  if  the  crossover  point  from  100  percent  efficiency  to  something 
less  would  be  the  same  for  both  the  routine  and  the  human  being.  Figure  5 
shows a picture of a character distorted with 30 percent random unbiased distortion.
As the reader will note, recognition by a human being is possible only
after  close  study  of  the  character  (again,  before  outlining).  It  is  reasonable 
to  assume  that  the  crossover  point  for  the  human  being  would  be  passed  with 
a  slight  increase  in  distortion.  While  100  percent  effectiveness  did  occur 
with  these  characters,  the  margin  of  correct  choice  was  quite  small.  The 
effectiveness  of  this  recognition  scheme  in  handling  this  particular  type  of 
distortion  is  clearly  indicated  in  this  brief  experiment. 

c.  Random  shifting  —  3  percent  distorted  characters 

The  random  shift  routine  shifts  the  pattern  a  maximum  of  six  cells 
in  each  of  the  four  directions  (up,  down,  left,  and  right).  Six  was  arbitrarily 
selected  as  providing  an  adequate  shift  for  testing  out  the  recognition  routine 
with a modification for centering. Figure 6 shows the results of the random
shifting. The characters used were the 3 percent distorted characters from
the  previous  experiments. 

d.  Recognition - shifted characters

When  the  shifted  characters  were  presented  to  the  recognition  routine, 
an  efficiency  of  approximately  72  percent  was  achieved.  It  should  be  pointed 
out  that  in  this  experiment  an  attempt  was  made  to  recenter  the  known 
character before undertaking recognition of the unknown. The unknown characters
in this experiment were not recentered. An improvement in efficiency
could  possibly  be  effected  by  recentering  of  both  known  and  unknown  characters; 
however,  such  tests  have  yet  to  be  performed. 
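
The report does not say how recentering was carried out. One simple possibility, offered only as an assumption, is to move the bounding box of the image to the middle of the grid, as in the sketch below; the name recenter is hypothetical.

```python
def recenter(character, blank=0):
    """Shift the image so its bounding box sits at the middle of the grid."""
    rows, cols = len(character), len(character[0])
    inked = [(i, j) for i in range(rows) for j in range(cols)
             if character[i][j] != blank]
    if not inked:                       # nothing to center in an empty grid
        return [row[:] for row in character]
    top = min(i for i, _ in inked)
    bottom = max(i for i, _ in inked)
    left = min(j for _, j in inked)
    right = max(j for _, j in inked)
    # Offsets that equalize the margins (rounded down when they are odd).
    d_row = (rows - 1 - top - bottom) // 2
    d_col = (cols - 1 - left - right) // 2
    out = [[blank] * cols for _ in range(rows)]
    for i, j in inked:
        out[i + d_row][j + d_col] = character[i][j]
    return out
```

Applying the same function to both the known and the unknown characters would correspond to the untested case mentioned above.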

e.  Random biased distortion/recognition

Figure  7  shows  a  character  which  has  received  20  percent  random 
biased  distortion.  With  a  30  x  32  quantization  this  results  in  the  deletion  of 
12  rows  and/or  columns.  While  relatively  large  sections  of  the  character 
were  deleted  in  some  cases,  recognition  is  still  an  easy  matter  if  done  visually. 
When  the  recognition  routine  was  used,  the  effectiveness  was  100  percent. 

f.  Rotation 

To show rotation, the Cyrillic letter Ё was chosen because of the
two dots at the top of the letter. Looking at the four rotations shown in
figure 8, one can use the positions of the dots to better visualize the
amount of rotation imposed upon the main body of the original image. The
figure  shows  5  degrees  counterclockwise,  15  degrees  clockwise,  45  degrees 
counterclockwise,  and  90  degrees  clockwise,  respectively.  The  distortion 
mentioned  in  the  discussion  of  this  program  becomes  highly  evident  for  angles 
of  rotation  in  excess  of  about  10  degrees.  As  this  method  views  the  quantization 
as  a  sequence  of  concentric  squares,  the  rotation  performed  is  actually  a 
combination of vertical and horizontal shifting of the cells. Since circles have
no edges or sides, such a procedure is bound to cause distortion.
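
For comparison, one common way to realize a transformation "multiplier" of the kind contemplated in section 2 is to apply a rotation matrix to the cell coordinates. The sketch below is not the author's method and was not part of this study; it assumes rotation about the center of the grid, nearest-cell sampling, and blank fill for cells whose source falls outside the grid, and the name rotate_nearest is hypothetical.

```python
import math


def rotate_nearest(grid, degrees, blank=0):
    """Rotate a grid clockwise about its center using a rotation matrix.

    Each output cell looks up the nearest source cell (inverse mapping),
    so no output cell is left unassigned by the rotation itself.
    """
    rows, cols = len(grid), len(grid[0])
    center_r, center_c = (rows - 1) / 2, (cols - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[blank] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            x, y = c - center_c, r - center_r      # x to the right, y downward
            # Inverse of the clockwise rotation gives the source coordinates.
            src_x = cos_t * x + sin_t * y
            src_y = -sin_t * x + cos_t * y
            sr, sc = round(src_y + center_r), round(src_x + center_c)
            if 0 <= sr < rows and 0 <= sc < cols:
                out[r][c] = grid[sr][sc]
    return out
```

Some loss of detail at intermediate angles remains unavoidable on a coarse grid, because rotated cells do not fall exactly onto new cells.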
Figure 8c  Figure 8d




4.  CONCLUSION

The  real  test  of  a  character  recognition  device  begins  when  it  must 
identify  not  only  ideal  figures  but  also  disturbed  figures  as  they  will  occur  in 
practical  application.  The  need  for  programs  to  produce  "artificial"  characters 
is  apparent.  It  is  hoped  that  the  programs  developed  in  the  course  of  this  study 
will help to alleviate this need.